Algorithms for simultaneous shrinkage and selection in regression and
classification provide attractive solutions to knotty old statistical challenges.
Nevertheless, as far as we can tell, Tibshirani's Lasso algorithm has had little
impact on statistical practice. Two particular reasons for this may be the
relative inefficiency of the original Lasso algorithm and the relative
complexity of more recent Lasso algorithms [e.g., Osborne, Presnell and Turlach
(2000)]. Efron, Hastie, Johnstone and Tibshirani have provided an efficient,
simple algorithm for the Lasso as well as algorithms for stagewise regression
and the new least angle regression. As such this paper is an important
contribution to statistical computing.

1. Predictive performance. The authors say little about predictive
performance issues. In our work, however, the relative out-of-sample predictive
performance of LARS, Lasso and Forward Stagewise (and variants thereof)
takes center stage. Interesting connections exist between boosting and
stagewise algorithms so predictive comparisons with boosting are also of interest.

The authors present a simple Cp statistic for LARS. In practice, a
cross-validation (CV) type approach for selecting the degree of shrinkage, while
computationally more expensive, may lead to better predictions. We considered
this using the LARS software. Here we report results for the authors'
diabetes data, the Boston housing data and the Servo data from the UCI
Machine Learning Repository. Specifically, we held out 10% of the data and
chose the shrinkage level using either Cp or nine-fold CV using 90% of the
data. Then we estimated mean square error (MSE) on the 10% hold-out
sample. Table 1 shows the results for main-effects models.

Table 1 exhibits two particular characteristics. First, as expected,
Stagewise, LARS and Lasso perform similarly. Second, Cp performs as well as
cross-validation; if this holds up more generally, larger-scale applications
will want to use Cp to select the degree of shrinkage.
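
The hold-out protocol described above can be sketched in a few lines. This is a minimal illustration and not the authors' LARS software: it uses a fixed-step forward stagewise fit on synthetic data, holds out 10% of the observations, and picks the number of stagewise steps (the degree of shrinkage) by nine-fold CV on the remaining 90%. The data, the step size `eps`, the cap `max_steps` and all function names are assumptions made for this example.

```python
import random

def stagewise_path(X, y, eps, max_steps):
    """Coefficients after 0, 1, ..., max_steps fixed-size forward stagewise steps."""
    d = len(X[0])
    beta = [0.0] * d
    resid = list(y)
    path = [list(beta)]
    for _ in range(max_steps):
        # Inner product of each covariate with the current residual.
        corr = [sum(row[j] * r for row, r in zip(X, resid)) for j in range(d)]
        j = max(range(d), key=lambda k: abs(corr[k]))
        step = eps if corr[j] > 0 else -eps
        beta[j] += step
        resid = [r - step * row[j] for r, row in zip(resid, X)]
        path.append(list(beta))
    return path

def mse(X, y, beta):
    """Mean square prediction error of the linear predictor X beta."""
    preds = (sum(b * x for b, x in zip(beta, row)) for row in X)
    return sum((yi - p) ** 2 for yi, p in zip(y, preds)) / len(y)

def cv_choose_steps(X, y, k, eps, max_steps):
    """Number of stagewise steps minimizing the summed k-fold validation MSE."""
    n = len(y)
    totals = [0.0] * (max_steps + 1)
    for fold in range(k):
        held = set(range(fold, n, k))
        X_tr = [X[i] for i in range(n) if i not in held]
        y_tr = [y[i] for i in range(n) if i not in held]
        X_va = [X[i] for i in sorted(held)]
        y_va = [y[i] for i in sorted(held)]
        for s, beta in enumerate(stagewise_path(X_tr, y_tr, eps, max_steps)):
            totals[s] += mse(X_va, y_va, beta)
    return min(range(max_steps + 1), key=lambda s: totals[s])

# Synthetic data (an assumption; not the diabetes, Boston or Servo data).
rng = random.Random(0)
true_beta = [3.0, -2.0, 0.0, 0.0, 1.0]
X = [[rng.gauss(0, 1) for _ in true_beta] for _ in range(100)]
y = [sum(b * x for b, x in zip(true_beta, row)) + rng.gauss(0, 1) for row in X]

# 90% for fitting and shrinkage selection, 10% held out for the final MSE.
X_fit, y_fit, X_hold, y_hold = X[:90], y[:90], X[90:], y[90:]
best_steps = cv_choose_steps(X_fit, y_fit, k=9, eps=0.1, max_steps=150)
beta_hat = stagewise_path(X_fit, y_fit, 0.1, best_steps)[-1]
holdout_mse = mse(X_hold, y_hold, beta_hat)
null_mse = mse(X_hold, y_hold, [0.0] * len(true_beta))
```

Replacing `cv_choose_steps` with a Cp-based rule would give the other column of the comparison; that variant is not shown here.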
Table 1
Stagewise, LARS and Lasso mean square error predictive performance,
comparing cross-validation with Cp

                 Diabetes          Boston           Servo
               CV      Cp       CV      Cp       CV      Cp
Stagewise     3083    3082     25.7    25.8     1.33    1.32
LARS          3080    3083     25.5    25.4     1.33    1.30
Lasso         3083    3082     25.8    25.7     1.34    1.31



Table 2 presents a reanalysis of the same three datasets but now
considering five different models: least squares; LARS using cross-validation to
select the coefficients; LARS using Cp to select the coefficients and allowing
for two-way interactions; least squares boosting fitting only main effects;
least squares boosting allowing for two-way interactions. Again we used the
authors' LARS software and, for the boosting results, the gbm package in
R [Ridgeway (2003)]. We evaluated all the models using the same
cross-validation group assignments.

A plain linear model provides the best out-of-sample predictive
performance for the diabetes dataset. By contrast, the Boston housing and Servo
data exhibit more complex structure and models incorporating higher-order
structure do a better job.

While no general conclusions can emerge from such a limited analysis, 
LARS seems to be competitive with these particular alternatives. We note, 
however, that for the Boston housing and Servo datasets Breiman (2001) 
reports substantially better predictive performance using random forests. 



Table 2
Predictive performance of competing methods: LM is a main-effects linear model
with least squares fitting; LARS is least angle regression with main effects and
CV shrinkage selection; LARS two-way Cp is least angle regression with main
effects and all two-way interactions, shrinkage selection via Cp; GBM additive
and GBM two-way use least squares boosting, the former using main effects only,
the latter using main effects and all two-way interactions; MSE is mean square
error on a 10% holdout sample; MAD is mean absolute deviation

                      Diabetes         Boston          Servo
                     MSE     MAD     MSE     MAD     MSE     MAD
LM                   3000    44.2    23.8    3.40    1.28    0.91
LARS                 3087    45.4    24.7    3.53    1.33    0.95
LARS two-way Cp      3090    45.1    14.2    2.58    0.93    0.60
GBM additive         3198    46.7    16.5    2.75    0.90    0.65
GBM two-way          3185    46.8    14.1    2.52    0.80    0.60

2. Extensions to generalized linear models. The minimal computational
complexity of LARS derives largely from the squared error loss function.
Applying LARS-type strategies to models with nonlinear loss functions will
require some form of approximation. Here we consider LARS-type algorithms
for logistic regression.

Consider the logistic log-likelihood for a regression function $f(x)$ which
will be linear in $x$:

$$
\ell(f) = \sum_{i=1}^{N} \left[ y_i f(x_i) - \log(1 + \exp(f(x_i))) \right]. \tag{2.1}
$$

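We can initialize $f(x) = \log(\bar{y}/(1 - \bar{y}))$. For some $\alpha$ we wish to find the
covariate $x_j$ that offers the greatest improvement in the logistic log-likelihood,
$\ell(f(x) + x_j^t \alpha)$. To find this $x_j$ we can compute the directional derivative for
each $j$ and choose the maximum,

$$
j^* = \arg\max_j \left| \left. \frac{d}{d\alpha} \ell(f(x) + x_j^t \alpha) \right|_{\alpha = 0} \right| \tag{2.2}
$$

$$
\phantom{j^*} = \arg\max_j \left| x_j^t \left( y - \frac{1}{1 + \exp(-f(x))} \right) \right|. \tag{2.3}
$$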
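
As a small illustration of (2.2) and (2.3), the sketch below evaluates the inner product of each covariate with the residual `y - p` at the initial fit and returns the index with the largest absolute value. The toy data and the function names are invented for this example.

```python
import math

def logistic(z):
    """Numerically stable 1 / (1 + exp(-z))."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def most_correlated_covariate(X, y, f):
    """Return (j_star, scores) with scores[j] = x_j^t (y - p) as in (2.3).

    X is a list of rows, y a list of 0/1 outcomes and f the current fit on
    the logit scale, one value per observation.
    """
    residual = [yi - logistic(fi) for yi, fi in zip(y, f)]
    n_cov = len(X[0])
    scores = [sum(row[j] * r for row, r in zip(X, residual)) for j in range(n_cov)]
    j_star = max(range(n_cov), key=lambda j: abs(scores[j]))
    return j_star, scores

# Toy data (hypothetical): four observations, two covariates.
X = [[1.0, 0.2], [0.5, -1.0], [-1.0, 0.3], [-0.5, 0.8]]
y = [1, 1, 0, 0]

# Initialize f at the log-odds of the mean outcome, as described in the text.
ybar = sum(y) / len(y)
f0 = [math.log(ybar / (1.0 - ybar))] * len(y)

j_star, scores = most_correlated_covariate(X, y, f0)
```

With these numbers the initial fitted probability is 0.5 for every observation, so the scores are 1.5 and -0.95 and the first covariate enters the active set.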

Note that as with LARS this is the covariate that is most highly correlated
with the residuals. The selected covariate is the first member of the active
set, $\mathcal{A}$. For $\alpha$ small enough (2.3) implies

$$
(s_{j^*} x_{j^*} - s_j x_j)^t \left( y - \frac{1}{1 + \exp(-f(x) - x_{j^*}^t \alpha)} \right) \ge 0 \tag{2.4}
$$

for all $j \in \mathcal{A}^C$, where $s_j$ indicates the sign of the correlation as in the LARS
development. Choosing $\alpha$ to have the largest magnitude while maintaining
the constraint in (2.4) involves a nonlinear optimization. However,
linearizing (2.4) yields a fairly simple approximate solution. If $x_2$ is the variable
with the second largest correlation with the residual, then

$$
\alpha = \frac{(s_{j^*} x_{j^*} - s_2 x_2)^t (y - p(x))}{(s_{j^*} x_{j^*} - s_2 x_2)^t \bigl( p(x)(1 - p(x)) x_{j^*} \bigr)}. \tag{2.5}
$$

The algorithm may need to iterate (2.5) to obtain the exact optimal $\alpha$.
Similar logic yields an algorithm for the full solution.
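
The linearization step leading to (2.5) is not spelled out in the text. One way to fill it in is a first-order Taylor expansion in $\alpha$. This sketch assumes that $p(x) = 1/(1 + \exp(-f(x)))$ denotes the current fitted probabilities (the text uses $p(x)$ without defining it) and that products such as $p(x)(1 - p(x)) x_{j^*}$ are taken elementwise over observations:

```latex
\begin{align*}
\frac{1}{1 + \exp(-f(x) - x_{j^*}^t \alpha)}
  &\approx p(x) + p(x)(1 - p(x))\, x_{j^*}\, \alpha, \\
0 &= (s_{j^*} x_{j^*} - s_2 x_2)^t
     \bigl( y - p(x) - p(x)(1 - p(x))\, x_{j^*}\, \alpha \bigr), \\
\alpha &= \frac{(s_{j^*} x_{j^*} - s_2 x_2)^t (y - p(x))}
               {(s_{j^*} x_{j^*} - s_2 x_2)^t \bigl( p(x)(1 - p(x))\, x_{j^*} \bigr)}.
\end{align*}
```

The second line sets the constraint (2.4) to equality for the runner-up covariate $x_2$, and solving for $\alpha$ recovers (2.5). Because the expansion is only first order, repeating the update is what moves $\alpha$ toward the exact solution.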

We simulated $N = 1000$ observations with 10 independent normal
covariates $x_i \sim N_{10}(0, I)$ with outcomes $Y_i \sim \mathrm{Bern}(1/(1 + \exp(-x_i^t \beta)))$, where
$\beta \sim N_{10}(0, I)$. Figure 1 shows a comparison of the coefficient estimates using
Forward Stagewise and the Least Angle method of estimating coefficients,
the final estimates arriving at the MLE. While the paper presents LARS
for squared error problems, the Least Angle approach seems applicable to a
wider family of models. However, an out-of-sample evaluation of predictive
performance is essential to assess the utility of such a modeling strategy.
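
A simulation of this kind can be sketched as follows. The code draws data as described above and runs a fixed-step Forward Stagewise fit for logistic regression, recording the log-likelihood (2.1) along the path. It is an illustrative sketch only: the seed, the step size `eps`, the number of steps and the function names are assumptions, and the Least Angle variant is not implemented here.

```python
import math
import random

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def log_lik(y, f):
    """Logistic log-likelihood (2.1): sum of y_i f_i - log(1 + exp(f_i))."""
    return sum(yi * fi - (max(fi, 0.0) + math.log1p(math.exp(-abs(fi))))
               for yi, fi in zip(y, f))

def simulate(n, d, rng):
    """Standard normal covariates and coefficients, Bernoulli outcomes."""
    beta = [rng.gauss(0, 1) for _ in range(d)]
    X = [[rng.gauss(0, 1) for _ in range(d)] for _ in range(n)]
    y = [1 if rng.random() < sigmoid(sum(b * x for b, x in zip(beta, row))) else 0
         for row in X]
    return X, y, beta

def forward_stagewise_logistic(X, y, eps=0.05, n_steps=200):
    """Repeatedly nudge the coefficient whose covariate best matches y - p."""
    n, d = len(X), len(X[0])
    ybar = sum(y) / n
    intercept = math.log(ybar / (1.0 - ybar))   # initial fit from the text
    f = [intercept] * n
    beta = [0.0] * d
    path = [log_lik(y, f)]
    for _ in range(n_steps):
        resid = [yi - sigmoid(fi) for yi, fi in zip(y, f)]
        grad = [sum(row[j] * r for row, r in zip(X, resid)) for j in range(d)]
        j = max(range(d), key=lambda k: abs(grad[k]))
        step = eps if grad[j] > 0 else -eps
        beta[j] += step
        f = [fi + step * row[j] for fi, row in zip(f, X)]
        path.append(log_lik(y, f))
    return intercept, beta, path

rng = random.Random(1)
X, y, true_beta = simulate(n=1000, d=10, rng=rng)
intercept, beta_hat, ll_path = forward_stagewise_logistic(X, y)
```

Plotting each entry of `beta_hat` against the step count would give a coefficient path of the kind compared in Figure 1; evaluating `log_lik` on a separate simulated sample would give the out-of-sample check called for above.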

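Specifically for the Lasso, one alternative strategy for logistic regression
is to use a quadratic approximation for the log-likelihood. Consider the
Bayesian version of Lasso with hyperparameter $\gamma$ (i.e., the penalized rather
than constrained version of Lasso):

$$
\begin{aligned}
\log f(\beta \mid y_1, \ldots, y_n)
&\propto \sum_{i=1}^{n} \log\bigl( y_i \Lambda(x_i \beta) + (1 - y_i)(1 - \Lambda(x_i \beta)) \bigr)
  + d \log\left( \frac{\gamma^{1/2}}{2} \right) - \gamma^{1/2} \sum_{i=1}^{d} |\beta_i| \\
&\approx \left( \sum_{i=1}^{n} a_i (x_i \beta)^2 + b_i (x_i \beta) + c_i \right)
  + d \log\left( \frac{\gamma^{1/2}}{2} \right) - \gamma^{1/2} \sum_{i=1}^{d} |\beta_i|,
\end{aligned}
$$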

where $\Lambda$ denotes the logistic link, $d$ is the dimension of $\beta$ and $a_i$, $b_i$ and
$c_i$ are Taylor coefficients. Fu's elegant coordinatewise "Shooting algorithm"
[Fu (1998)] can optimize this target starting from either the least squares
solution or from zero. In our experience the shooting algorithm converges
rapidly.

Fig. 1. Comparison of coefficient estimation for Forward Stagewise and Least Angle
Logistic Regression.
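
To make the coordinatewise idea concrete, here is a minimal sketch of a shooting-style update for the plain squared-error Lasso, minimizing the sum of squared residuals plus `lam` times the sum of absolute coefficients, starting from zero. The penalty parametrization `lam`, the stopping rule and the function name are assumptions for this illustration; applying it to the quadratic approximation above would additionally require the Taylor coefficients, which are not shown.

```python
def shooting_lasso(X, y, lam, n_sweeps=100, tol=1e-10):
    """Coordinatewise minimization of sum (y_i - x_i beta)^2 + lam * sum |beta_j|."""
    d = len(X[0])
    beta = [0.0] * d                      # start from zero
    resid = list(y)                       # current residual y - X beta
    col_ss = [sum(row[j] ** 2 for row in X) for j in range(d)]
    for _ in range(n_sweeps):
        max_change = 0.0
        for j in range(d):
            # Gradient of the residual sum of squares in beta_j, evaluated
            # at beta_j = 0 with the other coefficients held fixed.
            s0 = -2.0 * sum(row[j] * (r + row[j] * beta[j])
                            for row, r in zip(X, resid))
            if s0 > lam:
                new = (lam - s0) / (2.0 * col_ss[j])
            elif s0 < -lam:
                new = (-lam - s0) / (2.0 * col_ss[j])
            else:
                new = 0.0                 # penalty dominates: coefficient is zero
            delta = new - beta[j]
            if delta != 0.0:
                resid = [r - delta * row[j] for r, row in zip(resid, X)]
                beta[j] = new
            max_change = max(max_change, abs(delta))
        if max_change < tol:              # a full sweep changed nothing
            break
    return beta

# Toy orthogonal design (hypothetical) where the answer can be checked by hand.
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0]]
y = [2.0, 0.1, 2.0, 0.1]
beta = shooting_lasso(X, y, lam=1.0)
```

For this design the first coefficient is shrunk from its least squares value of 2 to 1.75, and the second is set exactly to zero because its gradient at zero does not exceed the penalty.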